The Five Color Concurrency Control Protocol:
Non-Two-Phase Locking in General Databases

Partha Dasgupta and Zvi M. Kedem
Abstract 


Concurrency control protocols that are based on 2-phase locking are
a popular family of locking protocols that preserve serializability in
general (unstructured) database systems. This paper presents a concurrency
control algorithm (for databases with no inherent structure)
that is practical, non-2-phase and allows varieties of serializable logs
not possible with any commonly known locking schemes. The protocol
achieves high concurrency by anticipating the existence (or
absence) of possible conflicts using information about transaction
read and write sets. It is well known that serializability is characterized
by acyclicity of the conflict graph representation of interleaved
executions. The 2-phase locking protocols allow only forward growth
of the paths in the graph. The Five Color protocol allows the conflict
graph to grow in any direction (avoiding 2-phase constraints)
and prevents cycles in the graph by maintaining transaction access
information in the form of data-item markers. The read and write set
information can also be used to provide relative immunity from
deadlocks. According to our simulation studies, this protocol allows
higher concurrency and lower deadlock frequencies than 2-phase locking.


1 Introduction.

This paper presents a concurrency control mechanism that uses five kinds of
locks. Unlike the 2-phase locking protocol, the Five Color protocol uses early
release of locks to enhance concurrency. The early release of locks causes the
protocol to be non-two-phase in its locking behavior. It has been shown that
2-phase locking is a necessary condition for serializability in general databases
when locking is the only mechanism used and nothing is known about the transactions
in advance. However, we show how serializability can be achieved using a non-two-phase
protocol by the addition of a validation phase. The Five Color protocol does not assume
any inherent structure in the database. We first present a brief introduction to the
concepts of serializability, the model of a multiuser database, the factors that limit
concurrency in 2-phase locking, and some related work. Section 2 contains a comprehensive
description of the Five Color protocol, including an intuitive description of how it
functions and why it ensures serializability. Section 3 explains the formal properties
of the protocol and derives a proof of correctness. Sections 5 and 6 deal with
deadlocks and livelocks, and Section 7 compares the efficiency of this protocol to
other well-known protocols, using both intuitive reasoning and simulation results.

1.1 Serializability.

A database is viewed as a collection of data items, which can be read or written
by concurrent transactions. Interleaving of updates can leave the database in an
inconsistent state. A sufficient condition to guarantee correctness of concurrent
database access is serializability of the actions (reads or writes) performed by the
transactions on the data items. That is, the interleaved execution of the transactions
should be equivalent to some serial execution of the transactions [BeGo80, RoStLe78].
In this paper, we assume serializability to be the criterion of correctness.

Locking of data items is one of the methods of achieving consistency in the face
of concurrent updates. For databases with no inherent structure (e.g., databases not
organized as DAGs, trees, etc.), the 2-phase locking protocol is the most popular
locking protocol. However, 2-phase locking is restrictive with respect to the amount
of concurrency it allows.

Informally, a log is a sequence of actions issued by various transactions on
several data items. The transaction actions may be interleaved with one another.
Serializability is a syntactic property of a log. It has been shown that recognizing
serializability is an NP-complete problem [BeShWo79, Pa79]. The NP-completeness
of the serializability recognition problem implies that we cannot have a scheduler
that allows all serializable logs, disallows non-serializable ones, and works in
polynomial time (unless P = NP). However, certain subclasses of serializable logs
are recognizable in polynomial time. Efficient algorithms can be built that
control the actions of transactions to ensure that the logs produced by a set of
transactions fall into one of these easily recognizable classes of serializability.
The 2-phase locking protocol is one such algorithm; it produces a class of polynomially
recognizable serializable logs, namely the 2-phase locked logs.

1.2 The Model.

A database D is a set of distinct items {x1, x2, ..., xm}. A transaction system T
is a set of transactions {T1, T2, ..., Tn} that operate on the database. The readset
(writeset) of a transaction Ti is the set of all items Ti reads (writes).

A transaction that intends to read (or write) a data item x issues a read (or
write) request to the transaction manager. The transaction manager is responsible
for determining whether or not granting the request may cause a violation of the
correctness criterion (generally serializability). The transaction manager then takes
appropriate action by granting, rejecting or delaying the request.


A trace of a transaction is a sequence of read and write requests it makes to the
transaction manager. A history is written as a sequence of actions of the form Ri(x)
or Wi(x), where Ri(x) (or Wi(x)) means that transaction Ti issues a read (write) on
data item x. Note that we are not interested in the values read or written, but in the
syntactic properties of the string that lists the sequence of reads and writes on the
data items.

We will assume at most one read and at most one write per transaction per
data item in any trace. If a transaction reads as well as writes a particular data item,
we assume the read will precede the write. Multiple reads and writes are handled in
an obvious way: the first read is used to read the value of the data item and store
it in local storage, and the other reads on the same data item are processed locally.
Similarly, all writes except the last one are written to local storage, and the last one
appears on the log. Thus there is no loss of generality.

A log of a transaction manager is a sequence of reads and writes granted by the
transaction manager. As an example, three transaction traces and one possible
transaction manager log are depicted below (the notation is from Bernstein et al.
[BeGo80]):

T1:   R1(x) R1(y) W1(y)

T2:   R2(y) W2(y)

T3:   R3(x) R3(y) R3(z) W3(x)

Log:  R1(x) R2(y) W2(y) R3(x) R1(y) R3(y) W1(y) R3(z) W3(x)

Note that when Ai(x) precedes A'i(x') in the trace of transaction Ti, then Ai(x) has
to precede A'i(x') in any log in which Ti participates.
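Whether a log is serializable in this sense can be checked by building its conflict graph and testing it for cycles (the abstract notes that serializability is characterized by acyclicity of this graph; strictly, this characterizes conflict serializability, a polynomially recognizable subclass). The following Python sketch is ours, not the paper's: the tuple encoding of actions and the helper names `parse`, `conflict_graph` and `is_conflict_serializable` are hypothetical. Applied to the example log, it finds the cycle T1 → T3 → T1 (R1(x) precedes W3(x), while R3(y) precedes W1(y)), so that particular log is not conflict serializable.

```python
from itertools import combinations

# An action such as R1(x) is encoded as the tuple ("R", 1, "x").
Action = tuple[str, int, str]


def parse(log: str) -> list[Action]:
    """Parse a log written as 'R1(x) W2(y) ...' into action tuples."""
    actions = []
    for token in log.split():
        paren = token.index("(")
        actions.append((token[0], int(token[1:paren]), token[paren + 1:-1]))
    return actions


def conflict_graph(log: list[Action]) -> set[tuple[int, int]]:
    """Edge (i, j) if an action of Ti precedes a conflicting action of Tj."""
    edges = set()
    for (k1, t1, x1), (k2, t2, x2) in combinations(log, 2):
        # Actions conflict when they belong to different transactions,
        # touch the same item, and at least one of them is a write.
        if t1 != t2 and x1 == x2 and "W" in (k1, k2):
            edges.add((t1, t2))
    return edges


def is_conflict_serializable(log: list[Action]) -> bool:
    """True iff the conflict graph of the log is acyclic (DFS cycle check)."""
    succ = {t: set() for _, t, _ in log}
    for i, j in conflict_graph(log):
        succ[i].add(j)
    state = dict.fromkeys(succ, "new")  # new -> active -> done

    def has_cycle(t: int) -> bool:
        state[t] = "active"
        for u in succ[t]:
            if state[u] == "active" or (state[u] == "new" and has_cycle(u)):
                return True
        state[t] = "done"
        return False

    return not any(state[t] == "new" and has_cycle(t) for t in list(succ))


example = parse("R1(x) R2(y) W2(y) R3(x) R1(y) R3(y) W1(y) R3(z) W3(x)")
print(sorted(conflict_graph(example)))    # [(1, 3), (2, 1), (2, 3), (3, 1)]
print(is_conflict_serializable(example))  # False
```

By contrast, the serial order T2, T1, T3 of the same traces yields an acyclic graph.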


1.3 Increasing Concurrency.

In database concurrency control we are interested in protocols that maximize
concurrency and work efficiently. However, the 2-phase locking protocols may not
score high on the concurrency criterion. 2-phase locking restricts the logs to a small
subset of serializable logs. The restrictive nature of 2-phase locking is due to the
fact that it does not assume any a priori knowledge of the intentions of the
transactions. Moreover, a 2-phase locking protocol can cause deadlocks, which further
degrade its performance.

A 2-phase locking protocol has to lock all the necessary data items before it
can unlock any. This is a strong constraint on the locking sequences that a
transaction may use, and may reduce concurrency in some cases. As a trivial
example, consider the following transaction histories:

T1:   R1(x), W1(x), R1(y), W1(y)

T2:   R2(x)

T3:   R3(y)

In this case, T2 (and T3) reads the value of x (and y), and does nothing else.
Thus T2 and T3 can read x (or y) between any two actions of T1, and still produce a
correct serializable execution sequence. However, if 2-phase locking is used, then T2
(or T3) is restricted to reading x (or y) at only certain points, depending upon the
locking sequence. For example, suppose T1 uses the following locking sequence (LS(x)
denotes setting a shared lock on x, LX(x) denotes setting an exclusive lock on x, and
U(x) denotes unlocking x):


LS1(x)
R1(x)
LX1(x)
W1(x)     --+
LS1(y)      |  T2 cannot read x here
R1(y)     --+
LX1(y)
U1(x)
W1(y)
U1(y)

Trace of Transaction T1.

This locking sequence prevents T2 from reading x between W1(x) and R1(y).
T1's locking sequence can be changed so that T2 can read x between
W1(x) and R1(y), but then T3 will not be able to read y between W1(x) and R1(y).

Since we know that T2 and T3 read x and y and do nothing else, we could state
that T2 and T3 can read x and y interleaved between any steps of T1. The situation
would be different if T2 or T3 accessed or updated some other data items after
reading x or y. However, the 2-phase locking protocol does not rely upon, or have
access to, information about the data access patterns of the transactions. We will
show how data access information can be used by concurrency control protocols to
deal with situations like the one shown above.

The above example also shows that the locking sequences of 2-phase locked
transactions affect the amount of concurrency, depending on the transaction mix. A
particular locking sequence may in fact favor one transaction over another. As there is
no way to predict which transactions will run concurrently, it may not be easy to
choose locking sequences. In fact, it is commonly believed that locking should not
be handled by transactions or application programs, but should be the responsibility
of lower-level, consistency-preserving routines, i.e., the transaction manager.

In order to make locking transparent, practical 2-phase locking schemes use
read and write requests to obtain locks. A read or write request on an unlocked data
item causes the lock to be obtained. All locks are released only when the transaction
terminates. Thus it is intuitively clear that the locks are held for extended periods.

We propose a concurrency control protocol that, in cases like the above, would
allow T2 and T3 to read x and y interleaved between any actions of T1. The locking
would be handled entirely by the transaction manager. The protocol is inherently
non-2-phase, and is relatively immune to deadlocks.

1.4 Related Work

It has been shown that available information about the transactions or the
database can be used to increase concurrency. For instance, the tree and DAG
(directed acyclic graph) protocols can be used on databases structured like a tree or
a DAG, respectively [SiKe80, KeSi82]. These protocols allow non-2-phase locking,
and provide higher concurrency than 2-phase locking for transactions that traverse
the tree or the DAG. In the DAG protocol, a transaction is allowed to lock a child if
it holds locks on a majority of its parents (except for the first node locked), and
unlocking may be done in any order. Thus the transaction can access only those data
items that form a rooted subgraph of the original graph. These and similar protocols
can be used in specialized applications and have practical limitations, but have
received substantial theoretical interest.
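The locking rule summarized above can be written as a predicate. This Python sketch is only our reading of that one-sentence summary, not the full protocol of [SiKe80, KeSi82]: we assume "majority" means strictly more than half of a node's parents, treat a parentless node as lockable only as the first node, and do not model any other rules of the original protocol. The function name `may_lock` is hypothetical.

```python
def may_lock(node, parents, held, first_lock):
    """Can a transaction lock `node` under the majority-of-parents rule?

    parents: dict mapping each node to the list of its parents in the DAG
    held: set of nodes the transaction currently holds locks on
    first_lock: True if this is the first node the transaction locks
    """
    if first_lock:
        return True  # the first node locked is exempt from the rule
    ps = parents.get(node, [])
    # Assumption: a strict majority of the parents must currently be locked.
    return bool(ps) and 2 * sum(p in held for p in ps) > len(ps)


# Diamond-shaped DAG: a -> b, a -> c, b -> d, c -> d.
parents = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(may_lock("b", parents, {"a"}, first_lock=False))       # True
print(may_lock("d", parents, {"b"}, first_lock=False))       # False
print(may_lock("d", parents, {"b", "c"}, first_lock=False))  # True
```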

If the writeset of a transaction is known in advance, timestamp protocols can be
made abort-free (or progressive) [BuSi83]. This can provide a significant improvement
in performance, as aborts severely limit throughput in timestamp-based
systems.

Semantic knowledge about transaction actions has also been used to speed up
transaction processing [Ga83]. A partial loss of serializability can be tolerated in
some applications and can be exploited for better performance [FiMi82]. This may not
be of interest in general-purpose databases, where consistency is a major issue.
Our approach involves knowing in advance supersets of the readset and the
writeset of the transactions. Knowing the exact readset (or writeset) of a transaction
is better but not always feasible. However, a superset of the readset (or writeset)
can often be statically determined. We will assume this is possible, and demonstrate
how this information can be used to achieve higher concurrency.

Read and write sets are used to control concurrency in SDD-1 (A System for
Distributed Databases) [BeShRo80]. Transactions are divided into classes depending
upon their read and write sets, and then conflict graph analysis is performed.
This is a static classification and is used to determine the nature of the conflicts. The
conflict type is used to determine which protocol to use. (SDD-1 uses different
protocols for different situations.) SDD-1 is a timestamp-based system. In our
approach the usage of read and write set information is dynamic: the protocol uses
locking, and no static analysis of conflicting transactions is performed.

2 The Five Color Protocol.

The Five Color Protocol is a non-two-phase locking protocol that ensures
serializability in general (unstructured) databases. It derives its name from the
five types of locks it uses.

A transaction T acts upon a set of data items D. A data item x ∈ D is in the
readset (Rd) of a transaction T (that is, x ∈ Rd(T)) if the transaction does a read
operation on x. Any item written by the transaction T is contained in the writeset
(Wr) of T. Note that Rd(T) need not be a subset of Wr(T), nor vice versa, and
Rd(T) and Wr(T) may be supersets of the data items actually read and written by
the transaction T.

Each transaction is required to declare its readset and writeset to the transaction
manager before it issues any actions. The read and write sets can be parameters
to the begin-transaction statement that is executed by a transaction when it starts.
Since the readset (and writeset) are allowed to be supersets of the data items
actually read (and written), they could be statically determined during query
compilation. Sometimes the transaction may read and write different sets of data
depending upon some statically undeterminable conditions. In this case the declared
read and write sets should include all the items the transaction may act upon. However,
for better performance the difference between the declared and actual read and write
sets should be small.
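The declaration rules above can be sketched in Python as follows. The class `Declaration` and its methods are hypothetical names of ours: the sketch rejects accesses outside the declared sets, computes the read-only part Rd(T) - Wr(T), and reports the declared-but-unused items that the text recommends keeping small.

```python
from dataclasses import dataclass, field


@dataclass
class Declaration:
    """Readset and writeset declared at begin-transaction (supersets allowed)."""
    readset: frozenset
    writeset: frozenset
    read: set = field(default_factory=set)     # items actually read
    written: set = field(default_factory=set)  # items actually written

    @property
    def readonly(self) -> frozenset:
        """Rd(T) - Wr(T): the read-only items."""
        return self.readset - self.writeset

    def on_read(self, x):
        if x not in self.readset:
            raise ValueError(f"{x!r} was not declared in the readset")
        self.read.add(x)

    def on_write(self, x):
        if x not in self.writeset:
            raise ValueError(f"{x!r} was not declared in the writeset")
        self.written.add(x)

    def unused(self) -> set:
        """Declared items the transaction never touched."""
        return set(self.readset - self.read) | set(self.writeset - self.written)


d = Declaration(readset=frozenset({"x", "y"}), writeset=frozenset({"y", "z"}))
d.on_read("x")
d.on_write("y")
print(sorted(d.readonly))  # ['x']
print(sorted(d.unused()))  # ['y', 'z']
```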

2.1 The Basic Algorithm.

After the transaction manager knows the read and write sets of the transaction,
it can obtain shared locks on all the data items in the readset, read them and store
the values in local storage. Then we would like to release the shared locks, as the
locks seem no longer necessary. We would also like to keep the data items in
the writeset locked by a shared lock while the transaction is running, to prevent
other transactions from updating them and causing missing updates. The shared locks
on the writeset can then be upgraded to exclusive locks at commit time, and the
values actually updated.

Thus our protocol is along the following lines:

• Get shared locks on the read and write sets.

• Read the readset into local storage and release the locks on the readonly
items.

• Service the reads and writes issued by the transaction from and to local
storage.

• Upgrade the shared locks on the writeset to exclusive locks and perform
actual writes to the database during commit.

The merits of this protocol are the early release of shared locks and the short
holding of exclusive locks. The protocol is obviously non-2-phase, as some shared locks
are released before some other shared locks are upgraded to exclusive locks. The
problem with this initial scheme is that it can produce non-serializable schedules,
and thus is unacceptable.
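A small hypothetical example (ours, not from the paper) shows why. Let T1 compute y := x + 1 and T2 compute x := y + 1, starting from x = y = 0. Under the basic scheme both transactions can hold their shared locks at once, each releases its read-only lock after reading, and each then upgrades its write lock unopposed. The Python sketch below replays this interleaving with a hypothetical `BasicLockTable` that raises an error wherever a real lock manager would block. It ends with x = y = 1, whereas running T1 then T2 gives x = 2, y = 1 and running T2 then T1 gives x = 1, y = 2, so the result matches no serial execution.

```python
from collections import defaultdict


class BasicLockTable:
    """Shared (S) and exclusive (X) locks for the basic scheme.
    Raises instead of blocking, so any conflict would be visible."""

    def __init__(self):
        self.shared = defaultdict(set)  # item -> transactions holding S
        self.exclusive = {}             # item -> transaction holding X

    def lock_shared(self, t, x):
        if self.exclusive.get(x, t) != t:
            raise RuntimeError(f"T{t}: S lock on {x} would block")
        self.shared[x].add(t)

    def upgrade(self, t, x):
        others = self.shared[x] - {t}
        if others or self.exclusive.get(x, t) != t:
            raise RuntimeError(f"T{t}: X lock on {x} would block")
        self.shared[x].discard(t)
        self.exclusive[x] = t

    def release(self, t, x):
        self.shared[x].discard(t)
        if self.exclusive.get(x) == t:
            del self.exclusive[x]


def run_basic_scheme():
    db = {"x": 0, "y": 0}
    reads = {1: "x", 2: "y"}   # T1 reads x, T2 reads y (both read-only items)
    writes = {1: "y", 2: "x"}  # T1 writes y, T2 writes x
    local = {}
    locks = BasicLockTable()
    for t in (1, 2):  # shared locks on the read and write sets
        locks.lock_shared(t, reads[t])
        locks.lock_shared(t, writes[t])
    for t in (1, 2):  # read into local storage, release read-only locks early
        local[t] = db[reads[t]]
        locks.release(t, reads[t])
    for t in (1, 2):  # commit: upgrade to exclusive, write, release
        locks.upgrade(t, writes[t])
        db[writes[t]] = local[t] + 1
        locks.release(t, writes[t])
    return db


print(run_basic_scheme())  # {'x': 1, 'y': 1}
```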

Our aim now is to use this basic idea, as described above, and add some checks
to ensure serializability. We show that this is possible if we use some marker locks
and a validation phase. The algorithms used to handle the marker locks are
nontrivial and are described in detail in the following sections.

2.2 Locking.

The Five Color Protocol uses five types of locks: White (WL), Blue (BL), Green
(GL), Yellow (YL) and Red (RL). The White and Blue locks are the marker locks¹.
They are compatible with all other locks (see Fig. 1). These are used by transactions
to keep track of data items read or written by other transactions and cause
triggering as described later.

The Green lock is a shared lock used for reading the readonly part of the readset.


¹ The reason for calling the Blue and White locks marker locks and not just markers is as
follows. Markers are used by agents to mark objects. Irrespective of how many times an object is
marked, it becomes unmarked when the marker is removed. With a lock we can ask questions
such as "Which agents hold locks on this object?" or "What are the objects locked by this agent?"
Conceptually, markers are too weak to answer both these questions.
Fig. 1: Lock Compatibility Table (Note: The table is asymmetric).

The Yellow lock is also a shared lock but is stronger than the Green lock. It is
used to lock the items in the writeset, as a preparatory measure, before they are
actually updated. The Yellow lock is compatible with the Green locks, allowing a
Yellow locked item to be read as a readonly data item by another transaction, but it
is not compatible with itself, preventing simultaneous update attempts. This
prevents the simplest deadlock situation, in which two transactions each try to
upgrade a shared lock on the same item to an exclusive lock. We could allow the
Yellow lock to be compatible with itself (making it a true shared lock), but this
would serve no purpose (except to increase the chances of deadlock).

Note that a Green lock can be obtained on a Yellow locked item but a Yellow
lock cannot be obtained on a Green locked item. This feature makes the compatibility
matrix asymmetric. This asymmetry is not a major feature of the algorithm but
has to be provided to prevent a particular race condition that can arise in the lock
acquisition phase, described later.

The Red lock is the exclusive lock used for writing, and is compatible only with
the White and Blue marker locks. Neither the Green locks nor the Red locks are held
for extended lengths of time. Only Yellow, Blue and White locks exist nearly as
long as the transaction does.
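Because the body of Fig. 1 is not reproduced in this copy, the following Python sketch reconstructs the compatibility relation from the prose of this section alone; it is our reading, not the original figure. Entries are indexed as COMPATIBLE[requested][held], meaning the requested lock can be granted while another transaction holds the held lock. The names `COMPATIBLE` and `can_grant` are hypothetical.

```python
COLORS = ("White", "Blue", "Green", "Yellow", "Red")

# COMPATIBLE[requested][held]
COMPATIBLE = {
    # Marker locks are compatible with every lock.
    "White": dict.fromkeys(COLORS, True),
    "Blue": dict.fromkeys(COLORS, True),
    # Green may be taken on Green- or Yellow-locked items, but not Red ones.
    "Green": {"White": True, "Blue": True, "Green": True, "Yellow": True, "Red": False},
    # Yellow conflicts with Green, with itself, and with Red.
    "Yellow": {"White": True, "Blue": True, "Green": False, "Yellow": False, "Red": False},
    # Red (exclusive) is compatible only with the marker locks.
    "Red": {"White": True, "Blue": True, "Green": False, "Yellow": False, "Red": False},
}


def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every lock held by others."""
    return all(COMPATIBLE[requested][h] for h in held_by_others)


# The asymmetry noted in the text: Green-on-Yellow is allowed, Yellow-on-Green is not.
print(can_grant("Green", ["Yellow"]), can_grant("Yellow", ["Green"]))  # True False
```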


2.3 Transaction Phases

A transaction T goes through several phases. When the transaction is initiated
(arrival point), the transaction manager obtains Yellow locks on the data items in
Wr(T) and Green locks on Rd(T) - Wr(T), i.e., the read-only data items. This is the
lock acquisition phase.

After all the locks are obtained, the transaction is validated. If the transaction
passes validation, then it has to acquire some Blue and White locks. It gets Blue (and
White) locks on data items written (and read) by some other concurrently running
transactions. It also has to donate White and Blue locks on the items in its read and
write sets to some other transactions. This is called the lock inheritance phase; the
exact details of which data items are locked are explained later. After completion of
the lock acquisition phase, the transaction reaches its locked point.

Subsequently, all the items in Rd(T) are read into local storage, the Green locks
are converted (downgraded) to White locks and the transaction enters the processing
phase. Then the transaction commences execution (start point). When the transaction
completes execution it commits (commit point): all Yellow locks are converted to
Red locks, the updated items are written to the database, all locks are
released, and the transaction terminates. These phases and points are illustrated in
Fig. 2.

The following is an informal outline of how a transaction manager handles a
transaction.

- Arrival Point (Transaction T arrives)

• Get Yellow locks on Wr(T) and compute Before/After sets (explained later)

• Get Green locks on Rd(T) - Wr(T)

• Do validation and lock inheritance processing (explained later)

- Locked Point

• Read values of Rd(T) into local storage,

• Downgrade Green locks to White locks,

• Start transaction processing.

- Start Point

• Let T commence processing:
    if T issues read(x), then return the value of x from local storage;
    if T issues write(x), then write x in local storage.

- Commit Point

• Upgrade Yellow locks to Red locks,

• Write updated items to the database,

• Release all (White, Blue, and Red) locks held by T.

- Termination Point (Transaction T terminates)
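The processing phase between the start point and the commit point can be sketched as a private workspace. In this Python sketch (the class `Workspace` is a hypothetical name, and the database is modeled as a plain dict), reads are served from the values copied in at the locked point, writes are buffered, and the database is touched only at commit, where the protocol would first upgrade the Yellow locks on Wr(T) to Red.

```python
class Workspace:
    """Local storage for one transaction during its processing phase."""

    def __init__(self, snapshot, writeset):
        self.values = dict(snapshot)         # values of Rd(T) read at the locked point
        self.writeset = frozenset(writeset)
        self.dirty = set()                   # items written locally so far

    def read(self, x):
        return self.values[x]                # served from local storage

    def write(self, x, value):
        if x not in self.writeset:
            raise ValueError(f"{x!r} is not in the declared writeset")
        self.values[x] = value               # buffered; the database is untouched
        self.dirty.add(x)

    def commit(self, db):
        # Here the Yellow locks would be upgraded to Red before writing.
        for x in sorted(self.dirty):
            db[x] = self.values[x]


db = {"x": 1, "y": 2}
ws = Workspace({"x": db["x"], "y": db["y"]}, writeset={"y"})
ws.write("y", ws.read("x") + ws.read("y"))
print(db)  # {'x': 1, 'y': 2}  (nothing written before commit)
ws.commit(db)
print(db)  # {'x': 1, 'y': 3}
```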

2.4 Transaction Manager Algorithms.

A transaction is said to be live if it has arrived but has not terminated. For
each live transaction T, the transaction manager maintains two temporary sets
during the lock acquisition phase, called Before(T) and After(T). These sets are
accessed and updated only during the lock acquisition phase. The set Before(T) is the
set of transactions that are live, conflict with T, and must come before T in a
serialization order. Similarly, After(T) contains those live and conflicting
transactions that must come after T in the serialization order. The serialization
order is determined by the actual order in which the locks are requested by the
concurrent transactions. (Sometimes the Before and After sets contain a few recently
terminated transactions, but that is of no major consequence.)

As the transaction manager acquires locks on behalf of a transaction, it can
determine which transactions must come before or after this transaction in the
serialization order by looking at the existing locks. These transactions are placed
in the Before(T) and After(T) sets. The Before(T) and After(T) sets are constructed
as follows.

Suppose T wants a Green lock on a data item x, and a set of transactions {Ti,
Tj, ...} already possess Blue locks on x. The existence of Blue locks held by {Ti,
Tj, ...} implies that some transaction(s) later than all of {Ti, Tj, ...} have
written x. As T wants to read x, it must come after all of {Ti, Tj, ...}. Thus
{Ti, Tj, ...} must logically precede T, and they are added to Before(T).

However, if some transaction Tk is holding a Yellow lock on x (when T is trying
to Green lock it), this implies Tk will update x after T reads it. Hence Tk should
come after T, and Tk is added to After(T).

Similarly, during an attempt to Yellow lock an item x, all transactions holding
Blue or White locks on x are added to Before(T).

Validation is simply checking whether Before(T) ∩ After(T) = ∅. If not, this
implies that there are transactions that must come before as well as after T in the
serialization order, and thus the resulting execution would be non-serializable. To
prevent this from occurring, the transaction T is rescheduled. Avoiding livelocks (or
starvation) due to rescheduling is dealt with in Section 6.
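The construction of Before(T) and After(T) and the validation test can be sketched in Python over a snapshot of the lock table. This is a simplification we assume for illustration: the protocol itself builds the sets incrementally as each LOCK call is granted, and the lock table here (a dict from item to a dict from transaction to held colors) and the function names are hypothetical. The scenario is also ours: T1 reads x and writes y and has reached its locked point (Yellow on y, White on x after the downgrade); T2 then arrives wanting to read y and write x.

```python
def before_after(T, readset, writeset, locks):
    """Compute (Before(T), After(T)) from the locks other transactions hold.
    locks: item -> {transaction: set of colors it holds on that item}."""
    before, after = set(), set()
    for x in writeset:  # Yellow locks on Wr(T)
        for t, colors in locks.get(x, {}).items():
            if t != T and colors & {"White", "Blue"}:
                before.add(t)
    for x in readset - writeset:  # Green locks on Rd(T) - Wr(T)
        for t, colors in locks.get(x, {}).items():
            if t == T:
                continue
            if "Blue" in colors:
                before.add(t)
            if "Yellow" in colors:
                after.add(t)
    return before, after


def passes_validation(before, after):
    """T may proceed only if no transaction must be both before and after it."""
    return not (before & after)


locks = {"x": {"T1": {"White"}}, "y": {"T1": {"Yellow"}}}
before, after = before_after("T2", readset={"y"}, writeset={"x"}, locks=locks)
print(before, after, passes_validation(before, after))  # {'T1'} {'T1'} False
```

T1 must precede T2 (T1 read the old x that T2 will overwrite) and also follow it (T2 reads y before T1 updates it), so T2 fails validation and would be rescheduled.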



If the transaction passes validation, then the lock inheritance processing has to
be done. Some Blue and White locks are granted to various transactions as a result
of lock inheritance. This is done as follows.

Suppose T has passed validation. T is delayed until all the transactions in
After(T) reach their respective locked points. Then T is given White locks on all
the data items White locked by transactions in After(T), and Blue locks on all the
data items Blue locked by transactions in After(T). Then all transactions in
Before(T) are given White locks on the readset of T and Blue locks on the writeset of T.
Finally, all transactions in Before(T) get White locks on all data items White locked
by T, and Blue locks on all data items Blue locked by T. Note that during these last
two steps of the lock acquisition phase of transaction T, some transactions other
than T get some Blue and White locks. (These other transactions are the transactions
in Before(T).) After lock inheritance, the transaction actually starts executing,
entering its processing phase.

The algorithms stated above are formally restated in pseudo-Pascal. Some of
the abbreviations used are:

WL - White Lock

WL-ed - White Locked

WLS(T) - Set of data items White locked by T

LOCK(White, x) - Obtain a White lock on data item x. If the lock is
unavailable due to a conflict, wait until it can be obtained.

(similarly for all other colors)


i) Getting Yellow Locks:

Before(T) := ∅;
for all x ∈ Wr(T) do
begin
    LOCK(Yellow, x);
    Before(T) := Before(T) ∪ { Ti | x is WL-ed or BL-ed by Ti }
end;
ii) Getting Green Locks:

After(T) := ∅;
for all x ∈ (Rd(T) - Wr(T)) do
begin
    LOCK(Green, x);
    Before(T) := Before(T) ∪ { Ti | x is BL-ed by Ti };
    After(T) := After(T) ∪ { Ti | x is YL-ed by Ti }
end;
iii) Validation and lock inheritance:

if (After(T) ∩ Before(T) ≠ ∅)    (* Validation *)
then RESCHEDULE T
else
begin
    for all t ∈ After(T) do
    begin
        Wait for t to reach locked point;
        BLS(T) := BLS(T) ∪ BLS(t) ∪ Wr(t);
            (* T gets Blue locks on the writeset of t
               and all items Blue locked by t *)
        WLS(T) := WLS(T) ∪ WLS(t) ∪ Rd(t)
            (* T gets White locks on the readset of t
               and all items White locked by t *)
    end;

    for all t ∈ Before(T) do
    begin
        BLS(t) := BLS(t) ∪ BLS(T) ∪ Wr(T);
            (* t gets Blue locks on all items
               Blue locked by T and the writeset of T *)
        WLS(t) := WLS(t) ∪ WLS(T) ∪ Rd(T)
            (* t gets White locks on all items
               White locked by T and the readset of T *)
    end
end;
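Step iii) can be rendered in Python as follows. This is a sketch under our own assumptions: WLS and BLS are modeled as dicts from transaction to item sets, `wait_for_locked_point` is a hypothetical callback standing in for the wait, and rescheduling is reported by returning False rather than performed. The scenario in the usage lines is also hypothetical: T1 holds a White lock on x, and T2 (readset {z}, writeset {x}) finds T1 in Before(T2).

```python
def validate_and_inherit(T, before, after, rd, wr, WLS, BLS,
                         wait_for_locked_point=lambda t: None):
    """Step iii): return False if T must be rescheduled; otherwise perform
    lock inheritance on WLS/BLS in place and return True.
    rd, wr: transaction -> declared readset / writeset."""
    if before & after:  # validation failed
        return False
    for t in after:
        wait_for_locked_point(t)
        BLS[T] |= BLS[t] | wr[t]  # Blue on Wr(t) and on items Blue locked by t
        WLS[T] |= WLS[t] | rd[t]  # White on Rd(t) and on items White locked by t
    for t in before:
        BLS[t] |= BLS[T] | wr[T]  # t gets Blue on Wr(T) and on T's Blue items
        WLS[t] |= WLS[T] | rd[T]  # t gets White on Rd(T) and on T's White items
    return True


rd = {"T1": {"x"}, "T2": {"z"}}
wr = {"T1": {"y"}, "T2": {"x"}}
WLS = {"T1": {"x"}, "T2": set()}
BLS = {"T1": set(), "T2": set()}
ok = validate_and_inherit("T2", before={"T1"}, after=set(),
                          rd=rd, wr=wr, WLS=WLS, BLS=BLS)
print(ok, BLS["T1"], sorted(WLS["T1"]))  # True {'x'} ['x', 'z']
```

After inheritance T1 holds a Blue lock on x, recording that a later transaction (T2) writes x.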


We stress that there is no assumption of atomicity of any part of the above
transaction manager algorithms, except in the test-and-set needed for the
implementation of the LOCK function. The LOCK function is the standard locking
primitive. If the lock requested cannot be granted due to a conflict, then the process
is suspended (or queued) until the lock can be granted. These algorithms can be
executed concurrently with all the activities of the other transactions on the
database system, including lock acquisition phases of other transactions.
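The only atomic operation the algorithms need is the test-and-set inside LOCK. A minimal Python sketch of such a blocking primitive, assuming threads stand in for transaction-manager processes, uses `threading.Condition`: holding its internal lock makes the compatibility test and the grant one atomic step, and `wait_for` suspends the caller until the request becomes compatible. The class `LockManager` and the conflict pairs, derived from the prose of Section 2.2, are our own.

```python
import threading
from collections import defaultdict

# (requested, held) pairs that conflict, read off the prose of Section 2.2;
# every other combination of colors is compatible.
CONFLICTS = {("Green", "Red"),
             ("Yellow", "Green"), ("Yellow", "Yellow"), ("Yellow", "Red"),
             ("Red", "Green"), ("Red", "Yellow"), ("Red", "Red")}


class LockManager:
    def __init__(self):
        self._cond = threading.Condition()
        self._held = defaultdict(lambda: defaultdict(set))  # item -> txn -> colors

    def _grantable(self, txn, color, x):
        return not any((color, c) in CONFLICTS
                       for t, colors in self._held[x].items() if t != txn
                       for c in colors)

    def lock(self, txn, color, x):
        """LOCK(color, x): wait until compatible, then grant atomically."""
        with self._cond:
            self._cond.wait_for(lambda: self._grantable(txn, color, x))
            self._held[x][txn].add(color)

    def unlock(self, txn, color, x):
        with self._cond:
            self._held[x][txn].discard(color)
            self._cond.notify_all()  # let waiters re-test their requests


lm = LockManager()
lm.lock("T1", "Green", "x")
granted = threading.Event()


def t2():
    lm.lock("T2", "Yellow", "x")  # blocks: Yellow is incompatible with Green
    granted.set()


worker = threading.Thread(target=t2)
worker.start()
print(granted.wait(0.2))  # False while T1 still holds its Green lock
lm.unlock("T1", "Green", "x")
print(granted.wait(2.0))  # True once the Green lock is released
worker.join()
```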

In our protocol, locks can be held only by live (or uncommitted) transactions.
Hence if a transaction t in Before(T) commits before T gets to do lock inheritance,
then t does not have to be given any locks.

2.5 Intuitive Discussion.

The transaction manager of the Five Color protocol handles all the locking and
concurrency control needed to run transactions. The transactions themselves do not
have any knowledge of the protocols. A transaction has the following structure:

Begin-Transaction (Readset, Writeset)

{ Statements }

End-Transaction.

The Begin-Transaction statement starts up the lock acquisition phase of the
transaction. The transaction manager acquires the locks (using the readset
and writeset information provided as parameters) and then performs the validation
and inheritance phases. After all the preprocessing is completed, the transaction
starts executing the statements, needing no further assistance from the transaction
manager. When the End-Transaction statement is reached, the transaction manager
is called upon to do the Yellow to Red lock upgrades, the actual writes and the commit.

Although locks are acquired on behalf of the transaction by the transaction
manager, in the rest of our discussion we will refer to this event as "a transaction
obtains a lock" for simplicity and conceptual clarity.

As described in section 2.1, the basic algorithm that the Five Color protocol uses is
as follows. First, the writeset is locked using Yellow locks. The Yellow lock is a
shared lock that allows reading, but is not compatible with itself. This allows
to-be-updated items to be read, but does not allow missing updates or simple
deadlocks (section 2.2).

Then the readset is locked with Green locks. The Green lock is a shared lock (or
read lock). After the readset has been locked, all the items in the readset are read
into local storage, where they are available to the transaction when it actually
needs to read them. Then the Green locks are downgraded to White locks.

After the locks are obtained, validation and lock inheritance processing are done;
the transaction then commences execution and finally commits. In the commit phase,
the Yellow locks are upgraded to Red locks, the data are written out, and all locks
are released. The need for validation, and how it is done, are discussed later in
this section.
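
The phases above can be traced per data item. The following Python sketch is our own illustration rather than the authors' implementation: it records which lock colors a single transaction holds at the end of each phase, ignores the White and Blue locks it may receive from other transactions, and visits items in sorted order only to keep the trace deterministic.

```python
from typing import Dict, List, Set, Tuple

Snapshot = Dict[str, Set[str]]  # data item -> lock colors held by this transaction


def five_color_phases(readset: Set[str], writeset: Set[str]) -> List[Tuple[str, Snapshot]]:
    """Return (phase, locks held at the end of that phase) for a single transaction."""
    held: Dict[str, Set[str]] = {}
    trace: List[Tuple[str, Snapshot]] = []

    def snap(phase: str) -> None:
        # Copy the color sets so that later phases do not alter earlier snapshots.
        trace.append((phase, {x: set(c) for x, c in held.items() if c}))

    for x in sorted(writeset):            # 1. Yellow locks on the writeset
        held.setdefault(x, set()).add("Yellow")
    snap("yellow")
    for x in sorted(readset):             # 2. Green locks on the readset
        held.setdefault(x, set()).add("Green")
    snap("green")                         #    (readset now copied to local storage)
    for x in sorted(readset):             # 3. early release: Green downgraded to White
        held[x].discard("Green")
        held[x].add("White")
    snap("execute")                       # 4. validation, inheritance, execution
    for x in sorted(writeset):            # 5. commit: Yellow upgraded to Red, data written
        held[x].discard("Yellow")
        held[x].add("Red")
    snap("commit")
    held.clear()                          #    ...and finally all locks are released
    snap("released")
    return trace


phases = dict(five_color_phases({"a", "b"}, {"b", "c"}))
print(phases["execute"])
```

In this trace no Green lock survives into the execution phase, which is the early release that the following paragraphs rely on.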

The properties that allow more concurrency in the Five Color protocol are the early
release of all read locks (Green locks) and the shared nature of the Yellow lock.

Suppose transaction T_1 is running and has released all the Green locks on its
readset. Now T_2 can update an item which was read by T_1, making T_2 a logically
later transaction in the serialization order. In 2-phase locking, T_2 has to wait
until T_1 releases the read lock, and this may mean waiting until T_1 commits. If
there are many reading transactions in a database system, transactions that write
can be held up for long periods of time.

In addition, write locks held by transactions updating data items cause delays for
reading transactions. The Yellow locking scheme allows read-only transactions to read
data items even in the presence of update transactions.

The early release of read locks may allow non-serializable executions, and these
have to be avoided. Validation is used to avoid such inconsistencies. The following
example shows the need for validation.

Suppose T_1 is running, has read x, and has released the Green lock on x. T_1 holds
a Yellow lock on y, which it intends to update later. Now a new transaction T_2 can
choose to update x (get a Yellow lock on x), making T_2 a logically later transaction
than T_1. T_2 can also attempt to read y (get a Green lock on y), an item that is
Yellow locked by T_1, making T_2 come logically before T_1. This would lead to a
non-serializable condition.

This is a simple case of a non-serializable execution that may occur. It may seem
that this condition is easy to detect without the complexities of validation and lock
inheritance as described in previous sections. However, there are situations
involving several transactions and several data items that turn out to be too complex
to detect this way, and the validation scheme described is one of the simplest
schemes that detects them. The validation works as follows.

A transaction T_1 holds White locks on all items it has read. T_1 also holds White
locks on data items read by other transactions that (i) are active or were active
during the lifetime of T_1, and (ii) should come after T_1 in the serialization
order. Similarly, T_1 holds Blue locks on all items written (or to be written) by
transactions that were active in T_1's lifetime and should come after T_1 in the
serialization order. (This property is proved later in Lemma 4.) This implies that
T_1 "knows" about all the data items read or written by "later" transactions.


A transaction is active if it has reached its locked point but has not started
acquiring Red locks.

The fact that T_1 holds Blue locks on data items that have not yet been updated can
lead to an anomaly in certain situations. See section 4 for a modification that
avoids this.

Suppose a new transaction T_2 arrives in this situation and Yellow locks an item that
is Blue or White locked by T_1. This implies that T_2 will update an item that has
been read or written by T_1 or by some transaction that is after T_1 in the
serialization order. Thus T_2 should be after T_1 in the serialization order. As
expected, the Yellow locking of this item causes T_1 to be placed in Before(T_2),
that is, before T_2 in the serialization order. In fact, all transactions such as
T_1 that should come before T_2 get into Before(T_2).

Now T_2 can cause a non-serializable condition by attempting to Green lock an item
that is already Yellow locked by a transaction such as T_1. Green locking a Yellow
locked item implies that T_2 is getting a view of the database as it was before T_1
updates the item it Yellow locked; thus T_2 should precede T_1. But this action
would cause T_1 to get into After(T_2), and T_2 would fail validation, since T_1
would then be in both Before(T_2) and After(T_2).

The fact that each transaction T_1 has White and Blue locks on the read and write
sets of all transactions that come after T_1 is ensured as follows. Whenever a new
transaction T_2 determines all the active transactions that follow it (by means of
the After(T_2) set), it acquires White (and Blue) locks on the readset (and writeset)
of all transactions in After(T_2). Similarly, it gives all the transactions in
Before(T_2) White locks on all its White locked objects, and Blue locks on all its
Blue locked objects and on Wr(T_2).
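
The two directions of lock inheritance amount to set unions. In this Python sketch the Txn record and the inherit() helper are hypothetical names. Following the wording above and the proof of Lemma 4, a transaction takes White locks on everything a later transaction has White locked (which includes its readset) and Blue locks on everything it has Blue locked plus its writeset; waiting for members of After(T) to reach their locked points is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Txn:
    rd: Set[str]                                   # Rd(T)
    wr: Set[str]                                   # Wr(T)
    white: Set[str] = field(default_factory=set)   # WLS(T): White locked items
    blue: Set[str] = field(default_factory=set)    # BLS(T): Blue locked items
    before: Set[str] = field(default_factory=set)  # names of transactions in Before(T)
    after: Set[str] = field(default_factory=set)   # names of transactions in After(T)


def inherit(name: str, txns: Dict[str, Txn]) -> None:
    """Lock inheritance performed by transaction `name` (hypothetical helper)."""
    t = txns[name]
    for later in (txns[n] for n in t.after):
        # Acquire White locks on what later transactions White locked (including
        # their readsets) and Blue locks on what they Blue locked plus their writesets.
        t.white |= later.white | later.rd
        t.blue |= later.blue | later.wr
    for earlier in (txns[n] for n in t.before):
        # Give earlier transactions White locks on T's White locked objects and
        # Blue locks on T's Blue locked objects and on Wr(T).
        earlier.white |= t.white
        earlier.blue |= t.blue | t.wr


# T1 has read x (White lock on x); T2 Yellow locks x and z, so T1 is in Before(T2).
txns = {"T1": Txn(rd={"x"}, wr={"y"}, white={"x"}),
        "T2": Txn(rd=set(), wr={"x", "z"}, before={"T1"})}
inherit("T2", txns)
print(sorted(txns["T1"].blue))  # ['x', 'z']
```

Doing the After-side inheritance before handing locks to Before(T) means that, in this sketch, earlier transactions also receive what T inherited from later ones, which is how the "knowledge" of Lemma 4 propagates along paths.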

The maintenance of the White and Blue locks might seem contrived and unnecessary,
but it has to be done to prevent conditions typified by the following example.

T_1                 T_2                 T_3                 Notes

Arrives,
Yellow locks y,
Green locks x,
reads x.
Converts Green
on x to White.
                    Arrives,
                    Yellow locks x,
                    Yellow locks z.                         T_1 ∈ Before(T_2)
Gets Blue lock
on z.
                    Updates x,
                    updates z,
                    terminates.
                                        Arrives,
                                        Yellow locks z,
                                        Green locks y.      (...interesting part,
                                                            see below.)

If T_3 were allowed to continue, then it would read y and write z, and cause a
non-serializable execution: T_3 reads y before T_1 updates it, T_1 read x before T_2
updated it, and T_2 wrote z before T_3 writes it, giving the cycle
T_3 → T_1 → T_2 → T_3. Note that at this point T_2 is no longer alive, and T_1 does
not touch z. So there seems to be no basis for preventing T_3 from reading y and
writing z, unless we use the Blue and White locks and the Before and After sets.

When T_3 Yellow locked z, z was Blue locked by T_1. This caused T_1 to become a
member of Before(T_3). When T_3 Green locked y, since y was Yellow locked by T_1,
T_1 became a member of After(T_3). Now that the intersection of Before(T_3) and
After(T_3) is not empty, T_3 fails validation, and the non-serializable execution is
not allowed.
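
The effect of the inherited Blue lock on z can be checked with a small replay of the table above. This Python sketch uses a hypothetical lock-table layout and encodes only the two rules used in this example: a Yellow request on an item White or Blue locked by another transaction puts that transaction in Before(T), and a Green request on an item Yellow locked by another transaction puts it in After(T). Validation fails when the two sets intersect.

```python
from typing import Dict, Set

Locks = Dict[str, Dict[str, Set[str]]]  # item -> transaction -> colors held


def grant(locks: Locks, txn: str, color: str, item: str) -> None:
    locks.setdefault(item, {}).setdefault(txn, set()).add(color)


def others_holding(locks: Locks, item: str, colors: Set[str], me: str) -> Set[str]:
    """Transactions other than `me` holding any of `colors` on `item`."""
    return {t for t, held in locks.get(item, {}).items() if t != me and held & colors}


def t3_passes_validation(inherited_blue: bool) -> bool:
    locks: Locks = {}
    grant(locks, "T1", "Yellow", "y")  # T1: Yellow locks y,
    grant(locks, "T1", "White", "x")   # reads x, Green on x downgraded to White
    # T2 Yellow locks x (White locked by T1), so T1 is in Before(T2).
    if inherited_blue:                 # T2's inheritance gives T1 Blue locks
        grant(locks, "T1", "Blue", "x")  # on Wr(T2) = {x, z}
        grant(locks, "T1", "Blue", "z")
    # T2 updates x and z and terminates; only T1's locks remain in the table.
    before = others_holding(locks, "z", {"White", "Blue"}, "T3")  # T3 Yellow locks z
    after = others_holding(locks, "y", {"Yellow"}, "T3")          # T3 Green locks y
    return not (before & after)


print(t3_passes_validation(True), t3_passes_validation(False))  # False True
```

Without the Blue marker on z, Before(T_3) would be empty and the cycle would go undetected, which is the point the example makes.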


2.6 An Example

Consider the following log:

Log:           R_1(x) R_2(y) W_1(y) R_3(z) W_2(z)

Serial order:  T_3 → T_2 → T_1

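The log is not 2-phase locked, as may be seen by inspection: T_2 must release its
lock on y before W_1(y), yet it can obtain a write lock on z only after R_3(z). This
log is allowed by the Five Color protocol. The behavior of the protocol in response
to the above log is depicted in Fig. 3.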

The above log has an interesting property. Note that although T_1 completes
execution before T_3 commences, T_3 precedes T_1 in the serialization order. This
violates a property of some serializable logs called strict serializability
[BeGo80]. Thus the above log is classified as non-strictly-serializable.
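
The serial order and the strictness violation can be checked mechanically. The Python sketch below is our own illustration using the standard-library graphlib module (Python 3.9+): it builds the conflict arcs from the log using the r-w, w-r and w-w cases defined in section 3 and sorts them topologically; static_order() would raise graphlib.CycleError for a non-serializable log.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

Op = Tuple[str, int, str]  # (kind "R" or "W", transaction number, data item)


def conflict_predecessors(log: List[Op]) -> Dict[int, Set[int]]:
    """Map each transaction to the transactions that must precede it."""
    preds: Dict[int, Set[int]] = {t: set() for _, t, _ in log}
    for i, (k1, t1, x1) in enumerate(log):
        for k2, t2, x2 in log[i + 1:]:
            # r-w, w-r and w-w conflicts on the same item give the arc t1 -> t2.
            if t1 != t2 and x1 == x2 and "W" in (k1, k2):
                preds[t2].add(t1)
    return preds


log = [("R", 1, "x"), ("R", 2, "y"), ("W", 1, "y"), ("R", 3, "z"), ("W", 2, "z")]
order = list(TopologicalSorter(conflict_predecessors(log)).static_order())
print(order)  # [3, 2, 1]

# Strictness: T1's last action precedes T3's first action, yet T3 precedes T1.
positions = {t: [i for i, (_, u, _) in enumerate(log) if u == t] for t in order}
print(positions[1][-1] < positions[3][0], order.index(3) < order.index(1))  # True True
```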

Most well known locking protocols do not allow non-strictly-serializable logs; that
is, the logs allowed by these protocols form a proper subset of the strictly
serializable logs. As 2-phase locking does not allow such logs, the above log cannot
be allowed by any 2-phase locking scheme. This demonstrates that the ability of the
Five Color protocol extends beyond the scope of 2-phase locking.

3 Properties and Correctness

We now state the properties of the Five Color protocol and prove that it achieves
serializability. The following lemmas are useful for understanding the details of
the algorithms, but the actual proofs may be skipped on a first reading. In order to
show that the Five Color protocol assures serializability, we define a standard
precedence relation →. This relation is created amongst transactions as the Five
Color scheduler executes, in a manner consistent with the conflicts that occur
amongst the transactions. We show that this relation, for transactions under the
protocol, is cycle free. All transactions that partake in the relations in the
definitions are assumed to be transactions that have already passed the validation
phase. Transactions that have not passed validation also generate relationships with
other transactions, but since these transactions do not contribute anything to the
processing environment, we choose to ignore them in the correctness proofs.

Definitions:

Define a binary relation (→) over transactions. Two transactions T_i and T_j are
related by the precedence relation (T_i → T_j) if and only if:

i) T_i reads some data item x, and at some later point T_j writes x (r-w conflict), or
ii) T_i writes some data item x, and at some later point T_j reads x (w-r conflict), or
iii) T_i writes some data item x, and at some later point T_j writes x (w-w conflict).

A log L is strictly serializable if there exists a serial order L' of L such that if
T_i and T_j are in L and their actions do not interleave, then T_i and T_j appear in
the same order in L' as in L.

Lemma 1

The relation T_i → T_j occurs if and only if one of the following cases takes place:

i) T_i gets a Green lock on x, and then T_j gets a Yellow lock on x after T_i
releases the Green lock. This arc is defined as of type [α→β | αG.βY].

ii) T_j gets a Yellow lock on x; then, while T_j holds the Yellow lock, T_i gets a
Green lock on x; and then T_j converts the Yellow lock into a Red lock. This arc is
defined as of type [α→β | βY.αG.βR].

iii) T_i gets a Yellow lock on x, and later, after T_i unlocks x, T_j gets a Green
lock on x. This arc is defined as of type [α→β | αY.αR.βG].

iv) T_i Yellow locks x, and later T_j Yellow locks x. This arc is defined as of type
[α→β | αY.βY].

Proof.

Simple, but lengthy.

[]

The two cases ii) and iii) above look similar but cause very different results. In
ii), T_j gets a Yellow lock on x and then, while the Yellow lock is held, T_i gets a
Green lock on x. In this case, T_j will update x, but T_i gets to see x as it existed
before T_j updated it. Hence the serialization order of the transactions should be
T_i → T_j. However, in case iii), under similar circumstances, if T_j gets a Yellow
lock, and after this Yellow lock has been converted to Red and released, T_i gets a
Green lock on x, then the serialization order should obviously be T_j → T_i. (The
roles of T_i and T_j in the second example have been reversed, to be similar to the
first example.)
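
The four cases of Lemma 1, and in particular the difference between cases ii) and iii), can be phrased as a classification of the lock events on a single item. This Python sketch is illustrative only: it assumes exactly two transactions, each of which either reads x (Green) or writes x (Yellow, later upgraded to Red), and it reports a case ii) arc as soon as the Green lock is granted, anticipating the writer's upgrade.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str]  # (transaction, lock obtained on x: "G", "Y" or "R")


def lemma1_arc(events: List[Event]) -> Optional[Tuple[str, str]]:
    """Return (Ti, Tj) for the arc Ti -> Tj induced by the events, or None."""
    pos: Dict[Event, int] = {}
    for i, ev in enumerate(events):
        pos.setdefault(ev, i)  # index of the first occurrence of each event
    txns = sorted({t for t, _ in events})
    if len(txns) != 2:
        return None
    a, b = txns
    for g, y in ((a, b), (b, a)):  # g reads x (Green), y writes x (Yellow)
        if (g, "G") in pos and (y, "Y") in pos:
            if pos[(g, "G")] < pos[(y, "Y")]:
                return (g, y)  # case i:   [α→β | αG.βY]
            if (y, "R") not in pos or pos[(g, "G")] < pos[(y, "R")]:
                return (g, y)  # case ii:  [α→β | βY.αG.βR], reader sees old value
            return (y, g)      # case iii: [α→β | αY.αR.βG], reader sees new value
    if (a, "Y") in pos and (b, "Y") in pos:
        return (a, b) if pos[(a, "Y")] < pos[(b, "Y")] else (b, a)  # case iv
    return None


print(lemma1_arc([("T2", "Y"), ("T1", "G"), ("T2", "R")]))  # ('T1', 'T2')
print(lemma1_arc([("T2", "Y"), ("T2", "R"), ("T1", "G")]))  # ('T2', 'T1')
```

The two printed calls differ only in whether the Green lock is obtained before or after the Yellow-to-Red upgrade, and the arc direction flips accordingly.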

The arcs of the precedence relation graph (→) are caused by r-w, w-r and w-w
conflicts. These conflicts happen when both transactions have actually processed the
conflicting reads and writes. However, for ease of modeling we will assume that the
arcs are created earlier. We define that an arc T_i → T_j is created when either T_i
or T_j reaches its locked point, whichever is later. (T_i and T_j must of course
conflict as stated in Lemma 1.) This occurs after all the locks for T_i and T_j have
been obtained, but before any actual conflict due to actual reading or writing has
taken place. Also, if a transaction has not reached its locked point, there is no arc
either from or to it.

Notation such as [α→β | αG.βY] denotes types of edges. If two transactions T_i and
T_j are related as T_i → T_j, this relation can be caused in many ways. α stands for
the transaction to the left of the → symbol, and β is the transaction on its right.
R, Y and G denote getting a Red, Yellow or Green lock. The sequence to the right of
the '|' shows the sequence of getting locks by α and β which led to the formation of
the → relation.

Note that the precedence graph thus formed is not identical to the graph formed by
the r-w, w-r and w-w conflicts at some given point in time. This is because the arcs
of the precedence graph are created prior to the occurrence of the conflicts. Thus
this precedence graph is in fact a superset of the conflict graph, and it becomes
identical to the conflict graph when all activity on the database ceases. Our
correctness criterion is the acyclicity of the conflict graph (see below), and
acyclicity of the precedence graph implies acyclicity of the conflict graph.

Lemma 2

If T_i → T_j and T_j reached its locked point before T_i did, then the arc can only
be of type [α→β | βY.αG.βR].

Proof.

Since T_j has reached its locked point, it has obtained all its locks. The arc
T_i → T_j could have been caused by four cases (Lemma 1). Consider each case
separately:

[α→β | αG.βY]
After T_i gets a Green lock on x, T_j cannot get a Yellow lock on x until T_i
converts the Green lock to a White lock, which happens only after T_i has reached its
locked point. Hence T_j cannot reach its locked point before T_i does, and this case
is impossible.

[α→β | βY.αG.βR]
This case is possible, as T_i may obtain a Green lock on x after T_j has placed a
Yellow lock on x.

[α→β | αY.αR.βG]
In this case T_i has to get a Red lock on x before T_j gets a Green lock on x. This
makes it impossible for T_j to reach its locked point before T_i; thus this case is
impossible.

[α→β | αY.βY]
Again, T_i has to release the Yellow lock on x before T_j can place another Yellow
lock on x. Hence T_j cannot reach its locked point before T_i, making this case
impossible too.

Thus only the second case can take place, and hence the arc is of type
[α→β | βY.αG.βR].

[]

Note that, as defined earlier, a transaction is active if it has reached its locked
point but has not started acquiring Red locks.

Lemma 3

If T_i → T_j and T_j reached its locked point before T_i did, then T_j is still
active when T_i reaches its locked point.

Proof.

The arc T_i → T_j is of type [α→β | βY.αG.βR] (Lemma 2). Thus T_j gets a Yellow
lock first, then T_i gets a Green lock, and finally T_j upgrades its Yellow lock to
Red. After T_i gets a Green lock on x, T_j cannot convert its Yellow lock to a Red
lock until T_i converts its Green lock to a White lock. Thus when T_i has reached its
locked point, T_j must be active.

[]

Lemmas 2 and 3 show that, unlike in 2-phase locking, the precedence graph can grow
"backwards" (see section 5). A transaction T_2 that arrives later than transaction
T_1 may precede T_1 in the precedence order, if T_1 is active when T_2 arrives.

Lemma 4

If T_1 → T_2 → ... → T_n is a path in the precedence relation, and T_1 is active,
then

i) T_1 possesses White locks on Rd(T_n) (i.e., Rd(T_n) ⊆ WLS(T_1)), and
ii) T_1 possesses Blue locks on Wr(T_n) (i.e., Wr(T_n) ⊆ BLS(T_1)).

Proof.

i) Proof is by induction on n, the number of transactions in the path.

[n = 2]

Let T_1 → T_2, where T_1 is active. The arc in the path can be of four types.
Consider each case separately:

[α→β | αG.βY]
T_2 can obtain the Yellow lock on the data item (say x) only after T_1 has converted
the Green lock it was holding to a White lock. When T_2 gets a Yellow lock on x while
T_1 holds a White lock on it, T_1 is added to Before(T_2). Then T_1 inherits White
locks on T_2's readset. Hence Rd(T_2) ⊆ WLS(T_1).

[α→β | βY.αG.βR]
When T_1 gets a Green lock on x while T_2 is holding a Yellow lock on x, T_2 gets
into the set After(T_1). Then T_1 inherits White locks on the readset of T_2. Hence
Rd(T_2) ⊆ WLS(T_1).

[α→β | αY.αR.βG]
T_1 cannot be active in this case.

[α→β | αY.βY]
T_1 cannot be active in this case.

Thus T_1 has White locks on the readset of T_2.

ii) Similar. Substitute Blue for White and writeset for readset in the above to show
that T_1 has Blue locks on the writeset of T_2.


[n > 2]

Assume i) and ii) are true for all paths consisting of up to n-1 transactions.
Consider a path consisting of n (n > 2) transactions. By definition, all the
transactions in this path have reached their locked points. Let T_k (1 < k < n) be
the transaction that reached its locked point last amongst all the transactions in
the path:

T_1 → T_2 → ... → T_{k-1} → T_k → T_{k+1} → T_{k+2} → ... → T_n

Consider the instant when T_k has just reached its locked point. At this point the
arcs

T_{k-1} → T_k and
T_k → T_{k+1}

are created. Thus prior to this there were two shorter paths,

a) T_1 → ... → T_{k-1} and
b) T_{k+1} → ... → T_n.

For path a), T_1 is alive (by assumption), and thus T_1 has White locks on
Rd(T_{k-1}) and Blue locks on Wr(T_{k-1}). As T_k reached its locked point later
than T_{k+1}, it follows (Lemma 3) that T_{k+1} was active when T_k reached its
locked point, and thus had White locks on Rd(T_n) and Blue locks on Wr(T_n)
(because of path b).

By Lemma 2, the arc T_k → T_{k+1} must be of type [α→β | βY.αG.βR]. Hence T_k gets
a Green lock on some x that is Yellow locked by T_{k+1}. Thus T_{k+1} is in
After(T_k), and since T_k gets White locks on all items White locked by T_{k+1},
T_k gets White locks on Rd(T_n).

The T_{k-1} → T_k arc can be of four types. Treating them separately:

[α→β | αG.βY]
In this case T_k sets a Yellow lock on a data item x which is in Rd(T_{k-1}), and
thus x is White locked by T_1. This makes T_1 ∈ Before(T_k), and thus T_1 inherits
White locks on all of WLS(T_k), which contains Rd(T_n). Thus T_1 gets White locks
on Rd(T_n).

[α→β | βY.αG.βR]
This cannot happen, as T_{k-1} reaches its locked point before T_k. In other words,
if T_k gets a Yellow lock and then T_{k-1} gets a Green lock, T_k becomes a member of
the set After(T_{k-1}). Now T_{k-1} has to wait for T_k to reach its locked point
before it can reach its own locked point. (This is a condition during lock
inheritance; please see the algorithm description on page 12.)

[α→β | αY.αR.βG]
In this case T_k sets a Green lock on a data item x which is in Wr(T_{k-1}), and
thus x is Blue locked by T_1. This causes T_1 ∈ Before(T_k), and thus T_1 inherits
White locks on all of WLS(T_k), which contains Rd(T_n). Thus T_1 gets White locks
on Rd(T_n).

[α→β | αY.βY]
In this case T_k sets a Yellow lock on a data item x which is in Wr(T_{k-1}), and
thus x is Blue locked by T_1. This makes T_1 ∈ Before(T_k), and thus T_1 inherits
White locks on all of WLS(T_k), which contains Rd(T_n). Thus T_1 gets White locks
on Rd(T_n).

Thus in all cases, T_1 has White locks on the readset of T_n.

ii) Similar. Substitute Blue for White and writeset for readset in the above to show
that T_1 has Blue locks on the writeset of T_n.

[]

Lemma 4 shows the most important property of the Five Color protocol. It implies
that if a transaction T_1 is active, it "knows" about the read and write sets of all
transactions that come after T_1. This property is used to achieve serializability
by causing a validation conflict when a cycle is created by some transaction.

Theorem 1:

The protocol ensures serializability.

Proof.

We show that the precedence graph is acyclic. The proof is by contradiction. Assume
there can be a cycle in the → relation. Choose a minimal cycle:

T_1 → T_2 → ... → T_{k-1} → T_k → T_1

Assume that T_k is the transaction to reach its locked point last, compared to all
the other transactions participating in the cycle. It will be shown that if T_k
accessed data items in a manner that caused the cycle in the → relation, then T_k
would have been rescheduled at the validation phase.

T_k causes the creation of two arcs, which close the cycle. They are the arcs:
A1: T_{k-1} → T_k and
A2: T_k → T_1.

As T_k is the last transaction to reach its locked point, T_1 must have reached its
locked point before T_k; hence (by Lemma 2) the arc <T_k → T_1> must be of type
[α→β | βY.αG.βR]. Thus T_k gets a Green lock on a data item Yellow locked by T_1.
Hence T_1 ∈ After(T_k).

Also, by Lemma 3, T_1 must have been active when T_k reached its locked point. Thus,
by Lemma 4 applied to the path T_1 → ... → T_{k-1}, T_1 has White locks on
Rd(T_{k-1}) and Blue locks on Wr(T_{k-1}).

The arc <T_{k-1} → T_k> could be of four types. Let us treat them separately:

[α→β | αG.βY]
In this case T_1 ∈ Before(T_k) (see the proof of Lemma 4). Thus
T_1 ∈ (Before(T_k) ∩ After(T_k)), hence T_k should have been rescheduled at
validation, and the cycle could not have resulted.

[α→β | βY.αG.βR]
This type of arc could have been caused if and only if T_k reached its locked point
before T_{k-1}, which is not the case.

[α→β | αY.αR.βG]
In this case T_1 ∈ Before(T_k) (see the proof of Lemma 4). Thus again
T_1 ∈ (After(T_k) ∩ Before(T_k)), and T_k should have been rescheduled.

[α→β | αY.βY]
In this case, again T_1 ∈ Before(T_k). The rest is as above.

Thus there can be no cycle in the → relation under the protocol, and hence the
protocol ensures consistency of updates.

[]

4 A Modification

The algorithm as described has an undesirable feature. It does not compromise
consistency, but it may cause more aborts than necessary. Consider the following
situation:

1) T_1 gets a Green lock on x
2) T_1 downgrades the Green lock to a White lock
3) T_2 gets a Yellow lock on x
4) T_3 gets a Green lock on x

At step 3, T_1 is in Before(T_2) and thus gets a Blue lock on x. When T_3 gets a
Green lock on x, Before(T_3) contains T_1. Actually T_1 and T_3 are unrelated.
However, T_1 believes that T_2 wrote x after T_1 read it, and thus that any
transaction reading x should come after T_1. The anomaly exists because T_2 has not
yet written x when T_3 reads x. There would have been no problem if T_2 had
terminated before T_3 read x, and in that case T_3 would indeed have to be after T_1.

Thus in a path of transactions T_1 → T_2 → ... → T_k, T_1 should have Blue locks
only on those data items that have been updated by T_2, ..., T_k, and not on all data
items in the writesets of T_2, ..., T_k.

The following is a very brief description of a method to prevent the above anomaly.
Introduce another type of lock, called an I-Blue lock (or Intent-Blue lock). During
lock inheritance, Blue locks are obtained on items already Blue locked by other
transactions, while I-Blue locks are obtained on the writesets (uncommitted updates)
of the transactions concerned. When a transaction commits, all the I-Blue locks held
on its writeset by other transactions are changed to Blue locks. All other algorithms
remain the same.
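
A minimal Python sketch of the I-Blue bookkeeping, assuming a hypothetical MarkerTable that stores, for each holding transaction, its Blue locks and its I-Blue locks grouped by the writer they came from. How validation treats I-Blue locks is not spelled out in the text; in this sketch only the blue sets are meant to be consulted.

```python
from typing import Dict, Set


class MarkerTable:
    """Hypothetical Blue / I-Blue marker store for the section 4 modification."""

    def __init__(self) -> None:
        self.blue: Dict[str, Set[str]] = {}               # holder -> Blue locked items
        self.iblue: Dict[str, Dict[str, Set[str]]] = {}   # holder -> writer -> items

    def inherit(self, holder: str, writer: str,
                writer_blue: Set[str], writer_wr: Set[str]) -> None:
        # Items the writer already holds Blue locks on are inherited as Blue...
        self.blue.setdefault(holder, set()).update(writer_blue)
        # ...but the writer's own uncommitted writeset only as I-Blue.
        self.iblue.setdefault(holder, {}).setdefault(writer, set()).update(writer_wr)

    def commit(self, writer: str) -> None:
        # When the writer commits, I-Blue locks on its writeset become Blue locks.
        for holder, by_writer in self.iblue.items():
            self.blue.setdefault(holder, set()).update(by_writer.pop(writer, set()))


# Section 4 scenario: T1 is in Before(T2) and T2 has Yellow locked x.
m = MarkerTable()
m.inherit("T1", "T2", writer_blue=set(), writer_wr={"x"})
print("x" in m.blue["T1"])  # False: no Blue lock on x while T2 is uncommitted
m.commit("T2")
print("x" in m.blue["T1"])  # True
```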

This modification is not incorporated in the algorithm as described in section 2, to
avoid introducing extra complexity that has no bearing on the correctness of the
protocol and to simplify understanding of the protocol. It is not a correctness
issue, nor an important point in the concepts used in the Five Color protocol.

5 Deadlocks

In the Five Color protocol there is potential for two types of deadlocks, one due to
locking and the other due to waiting for other processes to reach their locked
points. Let us address them separately.

The first form of deadlock is the one encountered in traditional locking protocols,
and is caused by transactions in a circular wait, trying to obtain locks. This form
of deadlock can be prevented in this protocol, since we know beforehand the resources
(locks) that the transaction needs to access. We define an ordering of resources,
and locks are obtained in increasing order. First all the Yellow locks are obtained,
in increasing order, and then all the Green locks are obtained in that order. The
Yellow to Red conversion is also done in increasing order of resources (data items).
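
The ordering discipline can be stated as a check on a transaction's sequence of lock requests. In this Python sketch the function name is hypothetical, and the ordering of resources is assumed to be the natural ordering of item names; any fixed total order shared by all transactions would serve.

```python
from typing import List, Tuple

PHASES = ("Yellow", "Green", "Red")  # "Red" = Yellow-to-Red conversion at commit


def is_preordered(requests: List[Tuple[str, str]]) -> bool:
    """True if the (color, item) requests follow the pre-ordered strategy: all Yellow
    locks in increasing item order, then all Green locks in increasing order, then
    all Yellow-to-Red conversions in increasing order."""
    keys = [(PHASES.index(color), item) for color, item in requests]
    return all(k1 < k2 for k1, k2 in zip(keys, keys[1:]))


print(is_preordered([("Yellow", "a"), ("Yellow", "c"), ("Green", "b"),
                     ("Red", "a"), ("Red", "c")]))           # True
print(is_preordered([("Yellow", "c"), ("Yellow", "a")]))    # False
print(is_preordered([("Green", "a"), ("Yellow", "b")]))     # False
```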

Theorem 2

The pre-ordered locking strategy used in the Five Color protocol is deadlock free.

Proof.

Suppose deadlocks can take place. Consider an instance of a deadlock involving k
transactions, with the cycle in the waits-for graph being:

T_1 → T_2 → ... → T_k → T_1

There are two cases.

i) Suppose none of the transactions T_1 to T_k are in the process of acquiring Red
locks, that is, they are in the Green or Yellow locking phase. Now T_1 to T_k cannot
all be in the Green lock acquiring phase, nor all in the Yellow lock acquiring phase.
(If there is one type of locking, this strategy is known to be deadlock free.) Thus
some transactions are waiting for a Green lock, and others are waiting for a Yellow
lock. Hence at some point, a transaction T_i is waiting for a Green lock on a data
item x for which T_j is holding a Yellow lock. This is a contradiction, as the Green
lock is compatible with an existing Yellow lock.

ii) Suppose at least one of the transactions, say T_1, is in the process of upgrading
its Yellow locks to Red locks. A transaction which is trying to upgrade a Yellow lock
on x to a Red lock will have to wait only if some other transaction holds a Green
lock on x (as no other transaction can hold a Yellow lock on x). Thus if T_1 is
waiting for T_2, then T_2 must be in its Green lock acquiring phase.

Similarly, a transaction trying to acquire a Green lock on x will wait for another
transaction only if that transaction holds a Red lock on x, as Red is the only lock
incompatible with the Green lock.

Thus if T_1 → T_2 → ... → T_k → T_1 is a cycle of waiting transactions and T_1 is
in the Red lock acquiring phase, then so are T_3, T_5, ..., while T_2, T_4, ... are
in the Green lock acquiring phase (that is, the cycle comprises only transactions in
the Green and Red lock acquiring phases).

The proof that there cannot be a cycle in a set of transactions having the above
properties is very similar to the proof that there cannot be a cycle in a set of
transactions acquiring exclusive locks in a predefined order, and is not included
here for brevity.

[]

If we use only one type of (exclusive) lock, this strategy of obtaining locks in a
predefined fashion is known to be deadlock free. However, this is not true in
general, where several types of locks with various compatibilities are used. In our
case, however, it is deadlock free, as shown in Theorem 2.
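
The compatibilities the proof relies on can be collected in a small table. The Python sketch below is our reading of the text: Green waits only for Red; a Yellow-to-Red upgrade waits only for Green; Yellow is incompatible with itself and, per the proof of Lemma 2, is not granted while another transaction holds Green on the item. Treating Red as blocking Yellow requests, and White and Blue as markers that never make a request wait, are assumptions on our part.

```python
from typing import Dict, Iterable, Set

# Requested lock -> colors held by OTHER transactions that make the request wait.
BLOCKED_BY: Dict[str, Set[str]] = {
    "Green": {"Red"},                      # Red is the only lock incompatible with Green
    "Yellow": {"Yellow", "Green", "Red"},  # Yellow conflicts with itself; Green per
                                           # Lemma 2; Red assumed (an upgraded Yellow)
    "Red": {"Green"},                      # an upgrade waits only for Green holders
    "White": set(),                        # assumed: markers never make a request wait
    "Blue": set(),
}


def can_grant(requested: str, held_by_others: Iterable[str]) -> bool:
    return not (BLOCKED_BY[requested] & set(held_by_others))


# Case i of the proof: a Green request never waits for a Yellow holder.
print(can_grant("Green", ["Yellow", "White"]))  # True
# Case ii: a Red upgrade waits only while some other transaction holds Green.
print(can_grant("Red", ["Green"]), can_grant("Red", ["White", "Blue"]))  # False True
```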

The second form of deadlock is peculiar to this protocol. This form of deadlock
involves one or more transactions in the lock inheritance phase. Note that in the
lock inheritance phase, a transaction T has to wait for all transactions in After(T)
to reach their locked points. This can cause deadlocks. The following is an example
of a deadlock caused by two transactions in the lock inheritance phase.

i) T_1 Yellow locks x
ii) T_2 Yellow locks y
iii) T_1 Green locks y
iv) T_2 Green locks x

In this sequence of events the following problem takes place. Steps i) and ii) cause
no surprises, but in step iii), as T_1 Green locks y, which is Yellow locked by T_2,
T_2 becomes a member of After(T_1). Similarly, when T_2 Green locks x, T_1 becomes a
member of After(T_2).

Now both transactions will pass validation and wait for all transactions in their
After sets to reach their locked points. T_1 will thus wait for T_2 to reach its
locked point before it can reach its own locked point, and vice versa. Thus we have
a deadlock. This deadlock effectively prevents the non-serializable log that could
result if the transactions were allowed to continue.
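
Because a transaction in the inheritance phase waits for every member of its After set, those sets can be read directly as edges of a waits-for graph. The Python sketch below applies an ordinary depth-first search for a cycle (one of the standard detection methods mentioned below, not a specific algorithm from this paper) to the example above; edges for transactions waiting on locks could be added to the same graph.

```python
from typing import Dict, List, Set


def find_cycle(waits_for: Dict[str, Set[str]]) -> List[str]:
    """Return the transactions on one waits-for cycle, or [] if there is none."""
    ON_PATH, DONE = 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def dfs(t: str) -> List[str]:
        state[t] = ON_PATH
        path.append(t)
        for u in sorted(waits_for.get(t, set())):
            if state.get(u) == ON_PATH:
                return path[path.index(u):]  # back edge closes a cycle
            if u not in state:
                cycle = dfs(u)
                if cycle:
                    return cycle
        path.pop()
        state[t] = DONE
        return []

    for t in sorted(waits_for):
        if t not in state:
            cycle = dfs(t)
            if cycle:
                return cycle
    return []


# After steps iii) and iv): After(T1) = {T2} and After(T2) = {T1}.
print(find_cycle({"T1": {"T2"}, "T2": {"T1"}}))  # ['T1', 'T2']
```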

The deadlock can involve transactions waiting for locks as well as transactions
waiting for other transactions to reach their locked points. However, as proved
above, there are no deadlocks involving only transactions waiting for locks.

The deadlocks can be detected by standard deadlock detection algorithms or by
timeouts. As the deadlock occurs before the transactions start execution, no rollback
is involved and the overhead suffered is small. Also, the probability of deadlock
seems to be lower than under 2-phase locking, for two reasons: there are no deadlocks
due to locking, and fewer transactions can participate in deadlocks (only transactions
in the pre-processing phase can).
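One standard detection method is a depth-first search for a cycle in a wait-for graph. The sketch below assumes a hypothetical encoding in which each transaction maps to the transactions it is waiting on, whether it waits for a lock or for another transaction to reach its locked point; it returns one cycle if any exists.

```python
def find_cycle(waits_for):
    """Return one deadlock cycle as a list of transactions (first == last),
    or None if the wait-for graph is acyclic.

    waits_for maps a transaction to the transactions it is waiting on,
    either for a lock or for reaching a locked point.
    """
    UNVISITED, ON_STACK, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(txn):
        state[txn] = ON_STACK
        path.append(txn)
        for other in waits_for.get(txn, ()):
            s = state.get(other, UNVISITED)
            if s == ON_STACK:                  # back edge closes a cycle
                return path[path.index(other):] + [other]
            if s == UNVISITED:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        state[txn] = DONE
        return None

    for txn in list(waits_for):
        if state.get(txn, UNVISITED) == UNVISITED:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None

print(find_cycle({"T1": ["T2"], "T2": ["T1"]}))   # ['T1', 'T2', 'T1']
print(find_cycle({"T1": ["T2"], "T2": ["T3"]}))   # None
```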

Since the transactions that participate in deadlocks do not do any processing, we
believe timeouts may be an easier and more efficient way to deal with deadlocks.
Lack of processing means there are no processing delays, so the timeouts can be
tuned more finely to account for locking delays and to raise deadlock warnings if
something takes too long to happen.

Every deadlock cycle has at least one transaction waiting for the completion of the
lock acquisition phase of another transaction. We propose that this is the point
where a timeout should be introduced. The timeout delay can be a function of the
number of locks the other transaction has to acquire. This keeps the probability of
detecting false deadlocks low.
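A minimal sketch of such a timeout, assuming a linear rule: a fixed base delay plus a per-lock allowance for each lock the other transaction has to acquire. The paper does not fix the function; `base`, `per_lock` and both function names are hypothetical.

```python
def lock_wait_timeout(remaining_locks, base=0.05, per_lock=0.01):
    """Hypothetical timeout (seconds) for waiting on another transaction's
    lock acquisition phase: grows linearly with the locks it still needs.
    base and per_lock are illustrative constants, not values from the paper."""
    if remaining_locks < 0:
        raise ValueError("remaining_locks must be non-negative")
    return base + per_lock * remaining_locks

def should_abort(waited, remaining_locks):
    # Suspect a deadlock only once the wait exceeds the scaled timeout, so a
    # transaction with many locks left to acquire is given more time.
    return waited > lock_wait_timeout(remaining_locks)

print(should_abort(0.1, 2))    # True: 0.1 s exceeds the 0.07 s allowance
print(should_abort(0.1, 10))   # False: allowance is about 0.15 s
```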

6  Livelocks 

There is a small chance of livelock or starvation in this protocol, due to the
rescheduling of transactions after validation failure: there is no guarantee that an
aborted transaction will eventually be able to run. We feel that the chance of
starvation is extremely small. But a low probability of starvation is still not a
guarantee against it, so we propose the following method of avoiding livelocks.

If a transaction gets aborted due to validation failure a large number of times, we
label it starvation prone. The writeset of a starvation-prone transaction is then
expanded to contain its readset. The transaction thus acquires only Yellow locks
during lock acquisition and, as a result, has an empty After set. This transaction is
guaranteed to pass validation, avoiding the livelock problem. Also, it never waits for
any other transaction to complete lock inheritance. It may still, however, participate
in deadlocks. But note that to detect deadlocks we time out and abort a transaction
that waits for transactions in its After set. Since the starvation-prone transaction
has an empty After set, it will not be aborted even if it participates in a deadlock.
Thus it is guaranteed to run to completion.
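The promotion rule can be sketched as follows. The threshold, the `Txn` record, and the assumption that items only read receive Green locks while written items receive Yellow locks (requested in a fixed sorted order) are illustrative choices made here, not details taken from the paper.

```python
from dataclasses import dataclass

STARVATION_THRESHOLD = 5  # hypothetical stand-in for "a large number of times"

@dataclass
class Txn:
    name: str
    readset: set
    writeset: set
    validation_failures: int = 0

    def starvation_prone(self) -> bool:
        return self.validation_failures >= STARVATION_THRESHOLD

    def lock_requests(self):
        """(item, color) pairs for lock acquisition, in a fixed sorted order.

        Assumed here: written items get Yellow locks, items only read get
        Green locks. For a starvation-prone transaction the writeset is
        expanded to contain the readset, so every request is Yellow.
        """
        writes = set(self.writeset)
        if self.starvation_prone():
            writes |= self.readset
        items = sorted(self.readset | writes)
        return [(item, "Yellow" if item in writes else "Green") for item in items]

t = Txn("T", readset={"a", "b"}, writeset={"b"})
print(t.lock_requests())   # [('a', 'Green'), ('b', 'Yellow')]
t.validation_failures = STARVATION_THRESHOLD
print(t.lock_requests())   # [('a', 'Yellow'), ('b', 'Yellow')]
```

Once promoted, the transaction issues no Green requests, which matches the text's statement that it ends up with an empty After set.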



7  Discussion. 

The Five Color protocol differs significantly from the 2-phase locking protocol in
the way the precedence (→) relation may grow. In the 2-phase locking protocol,
precedence arcs are created when a transaction locks a data item. When a transaction
locks an item (or upgrades a lock), it places itself after some other transaction,
never before. Thus we say the precedence relation grows only in the forward
direction. In fact, this is the property of the 2-phase locking protocol that ensures
serializability.

In our protocol, a path under the precedence order can grow in both directions.
Suppose transaction T has reached its locked point and possesses a Yellow lock on x.
A new transaction t arrives and gets a Green lock on x. When t reaches its locked
point, the arc t → T is created. Now, even after T terminates, as long as t is active,
a later transaction t' may come and place itself before t, and hence before T. In
this way, it is possible for future transactions to be logically placed in the past.
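The backward growth can be seen by computing an equivalent serial order from the precedence arcs. In this hypothetical sketch the transactions arrive in the order T, t, t' (written `t_prime`), yet a topological sort places t' first. It uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# predecessors[x] holds the transactions that precede x under the precedence relation.
predecessors = {}
# T has reached its locked point holding a Yellow lock on x; t Green locks x.
# When t reaches its locked point the arc t -> T is created.
predecessors["T"] = {"t"}
# Later, while t is still active, t' places itself before t: t' -> t.
predecessors["t"] = {"t_prime"}

order = list(TopologicalSorter(predecessors).static_order())
print(order)  # ['t_prime', 't', 'T']
```

Although t' arrived last, the only serial order consistent with the arcs puts it first.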

Thus the basic mechanism by which 2-phase locking ensures serializability is not
present in our protocol. Serializability is ensured, in this case, by the Blue and
White locks and the validation procedure. Intuitively, if T1 → ... → T2 is a chain of
transactions, then T1 "knows" about Rd(T2) and Wr(T2), because it has White and Blue
locks, respectively, on these data items. If any transaction t attempts to read any
data item in Wr(T2) (or write any item in Rd(T2)), then, due to the "triggering"
caused by Green (Yellow) locking of a Blue (White) locked item, T1 becomes a member
of Before(t) and T1 inherits White (Blue) locks on Rd(t) (Wr(t)). Thus information
about the read and write sets flows up a chain in the form of "inherited" White and
Blue locks. Now, if t could cause a cycle in the → relation by attempting to read an
item Yellow locked by T1, then T1 would become a member of After(t) and violate the
validation constraint. The other cases of information flow, when the chain grows in
the reverse direction or when two chains are concatenated by a transaction, are
similar and are covered in the proofs (Lemma 4).
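The cycle-prevention argument can be illustrated with a small check. For illustration only, we assume here that the validation constraint amounts to Before(t) and After(t) being disjoint; the function name and the set encoding are hypothetical.

```python
def passes_validation(before, after):
    """Assumed validation rule (illustrative): no transaction may be both
    before and after t, since that would close a cycle in the precedence
    relation."""
    return not (before & after)

before_t = {"T1"}   # triggering has made T1 a member of Before(t)
after_t = set()
print(passes_validation(before_t, after_t))   # True

after_t.add("T1")   # t reads an item Yellow locked by T1, so T1 joins After(t)
print(passes_validation(before_t, after_t))   # False: validation fails
```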

7.1   Efficiency  Comparisons  of  Locking  Protocols 


Setting Green locks on items which are Yellow locked may seem similar to a multiversion
scheme with two versions [BeGo83]. The transaction with the Yellow lock has created a new
version, but the transaction with a Green lock sees an older version. However, we do not
classify this protocol as a multiversion protocol, as the same situation exists in 2-phase
locking schemes and in other schemes where actual writing is done at commit time after
locks are upgraded. The before/after value phenomenon is in general not classified as a
true multiversion situation.
A method of comparing the performance of protocols is to compare the relative
sizes of the sets of logs they allow as output. If the set of logs generated by one
protocol is a subset of the set generated by another, then the latter protocol is
superior. However, the sets of output logs produced by the Five Color protocol and
the 2-phase locking protocols are incomparable. Intuitively, our protocol should
provide more concurrency for the following reasons (some of them have been stated
earlier).

• Early release of read locks.

The Five Color protocol releases read locks as soon as the data is read. This
allows other transactions to update these data items without having to wait for
the reading transaction to commit. (We assume that, for transparent 2-phase
locking, all locks are held until the commit point.)

Holding read locks for a long time, as the 2-phase locking protocol requires, has a
not very obvious problem. Suppose T1 and T2 have read locks on x and T3 requests
an exclusive lock. After that, T4 requests a read lock. Can the request from T4 be
satisfied while T3 is waiting? The answer is no: if T4 is allowed to read x, then T3
may be starved by read traffic on x and never get the exclusive lock. Thus T4 has to
wait for T1, T2 and T3 to commit (although it really could have read x when it asked
for the lock). This is an example where concurrency is drastically reduced by
fairness constraints.

The situation is different if read locks are guaranteed to be held for a short time,
as in the Five Color protocol. In this case we can allow T4 to read before T3 gets
its Yellow lock, as the chance of starving T3 is lower (the locks clear fast). But if
we choose to be fair and make T4 wait until T3 gets the Yellow lock, T4 can still get
its read lock as soon as T3 gets the Yellow lock. Also, neither T3 nor T4 has to wait
for T1 and T2 to commit: T1 and T2 release their Green locks as soon as they finish
reading x.

• Allowing reading of data items that will be written later.

This gives reading transactions faster access to data items that would otherwise
have been exclusively locked for quite some time. Another advantage is illustrated
by the example in the preceding paragraph.

• Incompatibility of Yellow locks.

This property prevents some deadlocks. Consider a variant of the 2-phase locking
protocol that allows lock upgrading. This protocol allows reading of data items
that will be updated later. But it also allows two transactions to begin updating
the same data item, i.e. they may both read the data item and then request an
upgrade to an exclusive lock. This causes what we term a trivial deadlock. The
deadlock is trivial to detect but just as serious as any other deadlock, as one of
the transactions has to be aborted. The incompatibility of Yellow locks avoids
trivial deadlocks; instead, an update transaction is delayed while another is
updating the data item (delay is better than deadlock). A simulation study we
conducted showed that trivial deadlocks are the most common form of deadlock in
the 2-phase locking protocol.

• Preordering of locking requests.

This avoids deadlocks due to locking. The avoidance of such deadlocks, as well as
of trivial deadlocks, gives the Five Color protocol a lower chance of deadlock.
Thus we expect higher throughput.
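The difference between lock upgrading and incompatible Yellow locks can be illustrated with two transactions that both read and then update x. The sketch below is a simplified model of our own: under 2-phase locking with upgrades, each updater holds a shared lock and its upgrade waits for every other shared holder, while with Yellow locks the first requester proceeds and later requesters wait for it (FIFO order is assumed). Only mutual, two-transaction waits are checked.

```python
def waits_for_2pl_upgrade(updaters):
    """2PL with lock upgrading: every updater first holds a shared lock on x,
    then requests an upgrade, which must wait for all other shared holders."""
    shared = set(updaters)
    return {t: shared - {t} for t in updaters}

def waits_for_yellow(updaters):
    """Yellow locks are mutually incompatible: the first requester gets the
    Yellow lock on x and later requesters wait for it (assumed FIFO order)."""
    first, *rest = updaters
    return {first: set(), **{t: {first} for t in rest}}

def mutual_wait(waits):
    """True if some pair of transactions wait for each other (a 2-cycle)."""
    return any(a in waits.get(b, set()) for a, bs in waits.items() for b in bs)

print(mutual_wait(waits_for_2pl_upgrade(["T1", "T2"])))  # True: trivial deadlock
print(mutual_wait(waits_for_yellow(["T1", "T2"])))       # False: T2 is only delayed
```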

It seems that the Five Color protocol should perform better than the 2-phase
locking protocols due to the above advantages. Here we present some characteristic
situations where the Five Color protocol may have advantages over two variants of
the 2-phase locking protocol. The examples consist of input logs and the
corresponding outputs produced by three concurrency control strategies. The
strategies are:

A: The standard 2-phase locking protocol.

B: A 2-phase locking variant in which read locks are acquired when the transaction
reads, the transaction's writes are kept in local storage, and write locks are
acquired at commit time.

C: The Five Color protocol.




Input log 1:   W1(x)  R2(x)  W1(y)

   A:  W1(x)  W1(y)  R2(x)
   B:  R2(x)  W1(x)  W1(y)
   C:  R2(x)  W1(x)  W1(y)

Input log 2:   R1(x)  W1(x)  R2(x)  W2(x)  W1(y)

   A:  R1(x)  W1(x)  W1(y)  R2(x)  W2(x)
   B:  Deadlock
   C:  R1(x)  W1(x)  W1(y)  R2(x)  W2(x)

Input log 3:   W1(y)  R2(x)  W2(x)  W1(x)  R2(y)

   A:  Deadlock
   B:  Deadlock
   C:  W1(x)  W1(y)  R2(x)  R2(y)  W2(x)

For input log 1, A) delays T2 unnecessarily, while B) and C) produce the same
result. Input log 2 causes B) to enter a trivial deadlock. Both input logs are
serializable.

Input log 3 is not serializable. Both A) and B) avoid illegal executions by causing
deadlocks. However, C) accepts the log and rearranges the steps to provide a
serializable output. The rearrangement of the write steps illustrates the order in
which Yellow locks are upgraded to Red locks. T2 is somewhat delayed, but that is
more acceptable than a deadlock.
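The serializability claims for the three input logs can be checked with the conflict-graph test: two steps conflict when they touch the same item, come from different transactions, and at least one is a write, and a log is conflict serializable exactly when the resulting precedence graph is acyclic. The Python sketch below (Python 3.9+ for `graphlib`) uses a parser of our own for the R2(x) step notation; the function names are hypothetical.

```python
import re
from graphlib import TopologicalSorter, CycleError

STEP = re.compile(r"([RW])(\d+)\((\w+)\)")

def parse(log):
    """'W1(x) R2(x)' -> [('W', 'T1', 'x'), ('R', 'T2', 'x')]"""
    return [(op, "T" + n, item) for op, n, item in STEP.findall(log)]

def precedence_graph(steps):
    """preds[T] = transactions that must precede T. For each pair of
    conflicting steps, the earlier step's transaction precedes the later's."""
    preds = {txn: set() for _, txn, _ in steps}
    for i, (op1, t1, x1) in enumerate(steps):
        for op2, t2, x2 in steps[i + 1:]:
            if x1 == x2 and t1 != t2 and "W" in (op1, op2):
                preds[t2].add(t1)
    return preds

def serial_order(log):
    """An equivalent serial order, or None if the conflict graph has a cycle."""
    try:
        return list(TopologicalSorter(precedence_graph(parse(log))).static_order())
    except CycleError:
        return None

for name, log in [("input log 1", "W1(x) R2(x) W1(y)"),
                  ("input log 2", "R1(x) W1(x) R2(x) W2(x) W1(y)"),
                  ("input log 3", "W1(y) R2(x) W2(x) W1(x) R2(y)"),
                  ("C on log 3", "W1(x) W1(y) R2(x) R2(y) W2(x)")]:
    print(name, serial_order(log))
```

The first, second and fourth logs yield the serial order ['T1', 'T2']; input log 3 yields None, because R2(x) and W2(x) precede W1(x) while W1(y) precedes R2(y).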

These are just a few examples showing cases where C) outperforms 2-phase locking.
However, they make clear why comparisons get complicated and cannot be handled with
the theoretical methods currently available to us.


7.2   Simulation  Results 
Until now, the claims of higher concurrency and lower deadlock rates for the Five
Color protocol have been based on intuitive reasoning. As a next step, we present
simulation results that compare the performance of the Five Color protocol against
the 2-phase locking protocol and against a database that uses no concurrency control
[Da84]. Using no concurrency control of course leads to incorrect executions, but it
defines an upper bound on the performance of concurrency control protocols.

Some of the results obtained in the simulations are described briefly in this
section. We tested the protocols for throughput and abort rates at low to high loads
(5 to 20 simultaneous transactions) and at low and high conflict rates (by varying
the read-to-write ratio and the number of data items in the database).

As far as throughput is concerned, at low load and low conflict, the 2-phase
locking protocol was about 6% poorer than no concurrency control, and the Five
Color protocol was about 3% poorer than 2-phase locking.

At medium load and low conflict, the Five Color protocol was about 12% poorer than
no concurrency control, and the 2-phase locking protocol trailed the Five Color
protocol by about 9%.

At medium load and high conflict, the Five Color protocol was 23% below no
concurrency control. The 2-phase locking protocol was plagued by thrashing due to
trivial deadlocks and trailed the Five Color protocol by 10% to 20% in various test
runs.

At high conflict and high load, the Five Color protocol was 36% worse than no
concurrency control, and the 2-phase locking protocol was lower than the Five Color
protocol by about 35% to 40%.

The Five Color protocol did very well at avoiding deadlocks and keeping abort rates
low. Without going into a case-by-case analysis, the Five Color protocol had about
one-fifth as many aborts as 2-phase locking at low load and about one-tenth as many
at high loads. The 2-phase locking protocol was plagued by a large number of
trivial deadlocks and cyclic aborts at medium to high loads.

The only situation where the Five Color protocol was (marginally) poorer than the
2-phase locking protocol was when the load was quite low. This was due to the
"bursty" way the Five Color protocol does I/O. For each transaction, all the data
items are read at the onset and all the data items are written at the end. While the
transaction executes, it consumes only CPU cycles. This makes the system oscillate
between an I/O-bound state and a CPU-bound state. We noticed this phenomenon by
checking the CPU and I/O queues. When several transactions run concurrently, their
different start and termination times even out the large I/O and CPU bursts, but at
low loads there are not enough transactions to give rise to a well-balanced system.
The 2-phase locking protocol, on the other hand, allows transactions to read data
items interspersed with processing and thus has better system performance at low
loads.

Overall, the simulation results support our claim of better performance and lower
abort rates for the Five Color protocol.

7.3  Conclusions 

We have presented a locking protocol that uses an unconventional locking strategy,
together with knowledge about the read and write sets of the transactions, to allow
non-2-phase locking on a general database. We show that this protocol ensures
serializability and provides higher concurrency than the 2-phase locking protocol.

The Five Color protocol is substantially more complicated than the 2-phase locking
protocol. In fact, the simplicity and elegance of the 2-phase locking protocol is one
of its major attractions. Nevertheless, we believe that the Five Color protocol is
worth serious consideration.

The overhead involved in running the Five Color protocol consists of somewhat
increased CPU-based processing. Since database applications are I/O intensive,
database systems have significant I/O traffic and comparatively little CPU traffic.
The extra overhead caused by the Five Color protocol algorithms is CPU based. All
the lock tables and sets can be stored in memory, and the protocol can use CPU
cycles that would otherwise be idle due to I/O delays. The extra overhead is hence
not expected to reduce system throughput.

Thus we conclude that the Five Color protocol can achieve more concurrency by
anticipating the absence of conflicts in a large number of cases, using the
information available to the transaction manager about transaction readsets and
writesets.